An unusual dual sugar-binding lectin domain controls the substrate specificity of a mucin-type O-glycosyltransferase

N-acetylgalactosaminyl-transferases (GalNAc-Ts) initiate mucin-type O-glycosylation, an abundant and complex posttranslational modification that regulates host-microbe interactions, tissue development, and metabolism. GalNAc-Ts contain a lectin domain consisting of three homologous repeats (α, β, and γ), where α and β can potentially interact with O-GalNAc on substrates to enhance activity toward a nearby acceptor Thr/Ser. The ubiquitous isoenzyme GalNAc-T1 modulates heart development, immunity, and SARS-CoV-2 infectivity, but its substrates are largely unknown. Here, we show that both α and β in GalNAc-T1 uniquely orchestrate the O-glycosylation of various glycopeptide substrates. The α repeat directs O-glycosylation to acceptor sites carboxyl-terminal to an existing GalNAc, while the β repeat directs O-glycosylation to amino-terminal sites. In addition, GalNAc-T1 incorporates α and β into various substrate binding modes to cooperatively increase the specificity toward an acceptor site located between two existing O-glycans. Our studies highlight a unique mechanism by which dual lectin repeats expand substrate specificity and provide crucial information for identifying the biological substrates of GalNAc-T1.

The PDF file includes: Supplementary Materials and Methods; Figs. S1 to S9; Tables S1 to S5; Legends for data S1 to S3; References. Other Supplementary Material for this manuscript includes the following:

Simulations setup
All the simulations were carried out in a cubic cell with periodic boundary conditions, using particle-mesh Ewald summation and the all-atom CHARMM (61) (param36) force field. The simulation box had an initial side length of ∼12.0 nm and was filled with ∼56,000 TIP3P water molecules, yielding an average density of ∼0.993 g/cm³ after equilibration. In all the complexes, disulfide bonds Cys106-Cys339 and Cys330-Cys408 in the catalytic domain and Cys442-Cys459, Cys482-Cys497, and Cys523-Cys540 in the lectin domain were set at the beginning of the simulations.
Protonation states were assigned based on pKa prediction with the PROPKA program (version 3.0) (62, 63) on the initial structure: all Asp and Glu were negatively charged, Arg and Lys were positively charged, and His residues were generally neutral. Parameters for α-D-GalNAc-L-threonine were created by chemical analogy from similar molecules in the CHARMM parameter files. The Na⁺ and Cl⁻ ions were randomly distributed by replacing water molecules. All bond lengths involving hydrogen atoms were constrained with the SHAKE algorithm, and an integration step of 2 fs was used. The temperature was maintained with the Hoover thermostat, using a mass of 10³ kcal mol⁻¹ ps², and the pressure with the Langevin piston method, using a piston mass of 400 amu and a collision frequency of 20 ps⁻¹.
The corresponding protein:peptide complexes were modeled through steered MD in a continuum solvent model (64, 65) by applying a gradually increasing harmonic force to the heavy atoms of Thr5-O-GalNAc, Thr13, and Thr25-O-GalNAc to bring them close to their relative positions observed in the available crystal structures: Thr5-GalNAc and Thr25-GalNAc near Asp444 and Asp484, respectively, and Thr13 near the catalytic site.
The force constant started at zero and was incremented by 0.1 kcal mol⁻¹ Å⁻² every 2 ns until the heavy-atom RMSD was less than 1 Å; in the process, the protein and target atoms were kept fixed. Additional structural adjustments were observed during the free 30-ns dynamics.
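The staircase ramp of the restraint force constant described above can be illustrated with a minimal sketch. It is not the authors' steering protocol: the function name `run_ramp`, its parameters, and the RMSD trace in the usage note are hypothetical, and the sketch assumes the RMSD is checked once at the end of each 2-ns stage.

```python
def run_ramp(rmsd_per_stage, threshold=1.0, increment=0.1, stage_ns=2.0):
    """Walk through a staircase ramp of the harmonic force constant.

    rmsd_per_stage: heavy-atom RMSD (in Angstrom) measured at the end of
        each stage; these values are supplied by the caller (hypothetical).
    threshold: RMSD (Angstrom) below which the ramp stops.
    increment: increase of the force constant per stage (kcal mol^-1 A^-2).
    stage_ns: duration of each stage in ns.

    Returns a dict with the stage index, force constant, and elapsed time
    at which the RMSD first dropped below the threshold, or None if the
    trace ended before that happened.
    """
    for stage, rmsd in enumerate(rmsd_per_stage):
        k = stage * increment  # stage 0 runs with k = 0
        if rmsd < threshold:
            return {"stage": stage, "k": k, "time_ns": (stage + 1) * stage_ns}
    return None


if __name__ == "__main__":
    # Hypothetical RMSD trace, for illustration only.
    print(run_ramp([8.0, 5.0, 2.5, 0.9]))
```

With the hypothetical trace above, the ramp stops in the fourth stage, where the force constant is three increments above zero.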

Comparative analysis of WT and mutant MD simulations
The apo-GalNAc-T1 D444A, apo-GalNAc-T1 D484A, and apo-GalNAc-T1 D444A/D484A systems were created from apo-GalNAc-T1 WT by replacing the corresponding residues in the equilibrated WT conformation. Six metrics were used to assess the changes elicited by the mutations: local side-chain flexibility, hydrophobic/nonpolar interaction networks, H-bond/salt-bridge interaction networks, local backbone conformational changes, and long-range (Pearson's and distance) cross-correlation of side-chain and backbone Cα motions. These quantities are sensitive to thermodynamic conditions and mutations and are thus suitable for detecting subtle structural and dynamic changes in comparative analysis. All the calculations were performed with the CHARMM program using the last 20 ns of productive simulation. The values of these metrics were projected onto the B-factor field of the corresponding representative coordinates (PDB format) and visualized as heatmaps in ChimeraX (66). Scripts, structures, and ChimeraX sessions are available in Data S2.
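As an illustration of one of the metrics listed above, the following is a minimal NumPy sketch of a Pearson-type cross-correlation matrix of atomic displacements. It is not the authors' CHARMM analysis (their scripts are in Data S2). The function name and input layout are assumptions, the frames are assumed to be already superposed on a common reference, and the distance cross-correlation is not covered.

```python
import numpy as np


def pearson_cross_correlation(coords):
    """Normalized cross-correlation of atomic displacements.

    coords: array of shape (n_frames, n_atoms, 3) holding, e.g., C-alpha
        positions from trajectory frames that were already aligned.

    Returns an (n_atoms, n_atoms) matrix with entries
        C_ij = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>),
    where dr_i is the displacement of atom i from its mean position.
    Atoms that never move would give a division by zero.
    """
    coords = np.asarray(coords, dtype=float)
    disp = coords - coords.mean(axis=0)  # displacement from mean position
    # Average over frames of the dot product between displacement vectors.
    cov = np.einsum("fik,fjk->ij", disp, disp) / coords.shape[0]
    norm = np.sqrt(np.diag(cov))
    return cov / np.outer(norm, norm)


if __name__ == "__main__":
    # Toy trajectory: atoms 0 and 1 move together, atom 2 moves oppositely.
    x = np.arange(4, dtype=float)
    toy = np.zeros((4, 3, 3))
    toy[:, 0, 0] = x
    toy[:, 1, 0] = x + 5.0
    toy[:, 2, 0] = -x
    print(np.round(pearson_cross_correlation(toy), 3))
```

Values near 1 indicate motions in the same direction and values near -1 indicate motions in opposite directions. A per-residue summary of such a matrix is the kind of quantity that could be written into the B-factor field for heatmap display.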
Solvent A was 0.1% formic acid in HPLC water, and solvent B was 80% acetonitrile/0.1% formic acid. The LC-MS method duration was 90 min. The LC flow rate was 0.25 μL/min, with a linear gradient of 4-40% solvent B over 6-72 min, followed by 40-95% B over 72-77 min, a wash in 95% B over 77-82 min, and finally equilibration at 1% B over 82-90 min. The spray voltage was 1800 V in positive-ion mode, with an ion transfer tube temperature of 250°C. Internal mass calibration was EASY-IC. Advanced peak determination was enabled. The Orbitrap at a resolution of 120K was used to detect precursor masses with a scan range of 250-1800 m/z. The duty cycle was 2 s. Maximum injection time was 50 ms with an AGC target of 400K. RF lens was 30%. Charge states of 2-8 were selected for HCD fragmentation with a collision energy of 30%, a resolution of 30K, a maximum injection time of 60 ms, an AGC target of 50K, an isolation window of 2, first mass at 110 m/z, and dynamic exclusion of 60 s. If oxonium ions at 126.055, 138.0549, 144.0655, 168.0654, 186.076, 204.0865, 274.0921, 292.1027, and 366.1395 m/z generated by HCD fragmentation were detected among the top 20 product ions within 15 ppm, EThcD fragmentation was triggered and acquired in the Orbitrap with a collision energy of 30%, a resolution of 30K, a maximum injection time of 200 ms, an AGC target of 100K, an SA collision energy of 35%, an ETD reagent target of 500K, a maximum ETD reagent injection time of 200 ms, and first mass at 110 m/z. ETD reaction time was 125 ms for charge 2, 100 ms for charge 3, and 75 ms for charge ≥4.

[Figure legend, truncated: … against Muc5AC-A peptides, with the acceptor Thr in red and GalNAc shown in a yellow square.]

LC-MS/MS data analysis for pinpointing glycosylation site
The software package pGlyco3 (46) (release date 2021-06-15) was used to identify glycopeptides. The Muc1 peptide sequence was used for the database search. Variable modifications were oxidation (M), HexNAc (S), and HexNAc (T). Carbamidomethylation (C) was the static modification. No enzyme digestion was used in the search. The HCD + ETHCD search mode was selected. The glycopeptide FDR was 0.01. The output data from pGlyco3 were filtered to keep peptide-spectrum matches (PSMs) having HexNAc in the Glycan Composition column. PSMs with the highest EThcD O-glycosite mapping score of at least 0.75 on any serine or threonine residue of the peptide sequences were kept. The m/z and intensity values of the peaks in PSMs identified by pGlyco3 were then exported and filtered to keep peaks with an intensity greater than 5000. The filtered values were then imported into the Interactive Peptide Spectral Annotator (https://www.interactivepeptidespectralannotator.com/PeptideAnnotator.html) for manual confirmation of peaks corresponding to b and y as well as c and z ions.
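The oxonium-ion trigger in the acquisition method can be illustrated with a short sketch of ppm-tolerance matching. This is not instrument software. The function names are hypothetical, and the sketch assumes that a single matching oxonium ion among the 20 most intense product ions is enough to trigger EThcD (the `min_matches` parameter makes that assumption adjustable).

```python
# Oxonium ion m/z values listed in the acquisition method.
OXONIUM_MZ = (126.055, 138.0549, 144.0655, 168.0654, 186.076,
              204.0865, 274.0921, 292.1027, 366.1395)


def ppm_error(observed, theoretical):
    """Signed mass error of an observed m/z in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def triggers_ethcd(peaks, tol_ppm=15.0, top_n=20, min_matches=1):
    """Decide whether an HCD spectrum would trigger EThcD.

    peaks: iterable of (mz, intensity) pairs from one HCD spectrum.
    Only the top_n most intense peaks are considered. min_matches is an
    assumption of this sketch: the number of distinct oxonium ions that
    must be matched within tol_ppm.
    """
    top = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    matched = {
        ref
        for ref in OXONIUM_MZ
        for mz, _ in top
        if abs(ppm_error(mz, ref)) <= tol_ppm
    }
    return len(matched) >= min_matches


if __name__ == "__main__":
    # Hypothetical spectrum: one peak close to the HexNAc oxonium ion.
    print(triggers_ethcd([(204.0867, 1.0e5), (300.0, 5.0e4)]))
```

A peak that lies outside the 15 ppm window, or that is not among the 20 most intense product ions, does not count toward the trigger in this sketch.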

Fig. S4. (A) Superposition of the β repeat of GalNAc-T1 (dark colors) and the α repeat of GalNAc-T2 (light colors), showing that GalNAc-binding residues adopt a similar conformation to interact with Thr-O-GalNAc. (B) The lectin repeats of apo-GalNAc-T1 and substrate-bound GalNAc-T1 are superposable, suggesting that Muc5AC-13 glycopeptide binding does not greatly alter the overall lectin domain conformation.

Fig. S5. Mass spectrometry data showing the identification of Thr15 as the only glycosylation site on the Muc1 peptide after treatment with WT or variants of GalNAc-T1.

Fig. S9. (A) Snapshot at the end of the dynamics simulation showing the di-glycosylated Muc1 with the two GalNAc bound to their respective pockets in the α and β repeats of WT GalNAc-T1. Left: ribbon representation, with the catalytic domain colored green and the lectin domain blue. Right: the same snapshot showing the molecular surface of GalNAc-T1, with the deep peptide-wrapping crevice at the lectin/catalytic domain boundary that further stabilizes the complex. (B) Snapshots at equal time intervals along the 30-ns trajectories of two simulations, one in which the two GalNAc remain bound to their respective sites and one in which the C-terminal GalNAc becomes detached. (C) Proposed kinetic model based on the simulations and experimental results. All the proposed paths are likely to co-exist, with mutations modulating each. Expressions for kcat and KM can be formally derived from the corresponding kinetic equations after suitable approximations (e.g., Michaelis-Menten assumptions). Other steps known to be involved, e.g., opening/closing of the activation loop and detachment of UDP, are implicit in the diagram but can be explicitly factored in. (D) Definitions of each of the steps depicted in (C).

Table S1. Enzyme kinetics for the Muc5AC-A peptides and GalNAc-T1 variants

Table S2. Crystal diffraction and refinement data

Table S3. Enzyme kinetics for the Muc1 peptides and GalNAc-T1